Polyhedral parallel code generation for CUDA
Authors
Abstract
Similar resources
AutoGPU: Automatic Generation of CUDA Kernel Code
Manual optimization of a CUDA kernel can be an arduous task, even for the simplest of kernels. The CUDA programming model is such that high performance may only be achieved if memory accesses in the kernel follow certain patterns; further, fine-tuning of the kernel execution and loop configuration may result in a dramatic increase in performance. The number of possible such configurations mak...
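To make the point about memory-access patterns concrete, here is a minimal Python sketch under a deliberately simplified coalescing model (assumed: 32-thread warps, 4-byte elements, 128-byte aligned segments; real GPUs vary by architecture). It only counts how many distinct segments one warp touches when consecutive threads read consecutive elements versus elements one full row apart. The function names are hypothetical, and this is not AutoGPU's method.

```python
# Simplified coalescing model (assumptions: 32-thread warp, 4-byte elements,
# 128-byte aligned segments). Real hardware rules differ by GPU generation;
# this only counts distinct segments touched by one warp's loads.
WARP_SIZE = 32
ELEM_BYTES = 4
SEGMENT_BYTES = 128


def segments_touched(addresses):
    """Count distinct 128-byte segments covered by a warp's byte addresses."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


def warp_addresses(width, threads_along_row):
    """Byte addresses one warp reads from a row-major width x width float array.

    threads_along_row=True:  thread t reads A[0][t] (unit stride).
    threads_along_row=False: thread t reads A[t][0] (stride of one full row).
    """
    addrs = []
    for t in range(WARP_SIZE):
        row, col = (0, t) if threads_along_row else (t, 0)
        addrs.append((row * width + col) * ELEM_BYTES)
    return addrs


if __name__ == "__main__":
    n = 1024
    print("unit stride:", segments_touched(warp_addresses(n, True)))
    print("row stride: ", segments_touched(warp_addresses(n, False)))
```

With `n = 1024`, the unit-stride pattern touches 1 segment and the row-stride pattern touches 32, which is the kind of gap that makes access-pattern tuning worthwhile.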
Polyhedral Code Generation in the Real World
The polyhedral model is known to be a powerful framework for reasoning about high-level loop transformations. Recent developments in optimizing compilers have overturned some generally accepted ideas about the limitations of this model. First, thanks to advances in dependence analysis for irregular access patterns, its applicability, which was supposed to be limited to very simple loop nests, has been extended...
On Code-Generation in the Polyhedral Model
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often-ignored step in this process that has a significant impact on the quality of the final code. Previous code generation methods are based on loop spli...
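The abstract describes mapping an iteration space to a space-time domain with affine functions and then generating code that scans the result. A minimal Python sketch of that pipeline for one hand-picked case follows: a 2D loop nest whose body reads `A[i-1][j]` and `A[i][j-1]` (an assumed example), the schedule t = i + j, and the allocation p = j. The loop bounds for p are derived by hand from the domain constraints; real polyhedral code generators compute them automatically, and this is not the paper's algorithm.

```python
# Affine space-time mapping for a 2D loop nest (illustrative choice):
#   schedule   t = i + j   (time step / wavefront)
#   allocation p = j       (virtual processor)
# Loop bounds for p come from 0 <= i = t - p <= n - 1 and 0 <= p <= m - 1.
from math import comb


def original_order(n, m):
    """Iterations of `for i in 0..n-1: for j in 0..m-1` in source order."""
    return [(i, j) for i in range(n) for j in range(m)]


def transformed_order(n, m):
    """Scan the image of the iteration space, mapping each (t, p) back to (i, j)."""
    order = []
    for t in range(n + m - 1):          # sequential time steps
        lo = max(0, t - n + 1)          # from t - p <= n - 1
        hi = min(m - 1, t)              # from t - p >= 0 and p <= m - 1
        for p in range(lo, hi + 1):     # independent within one time step
            order.append((t - p, p))
    return order


def run(order, n, m):
    """Execute A[i][j] = A[i-1][j] + A[i][j-1] (borders fixed at 1) in `order`."""
    a = [[1] * m for _ in range(n)]
    for i, j in order:
        if i > 0 and j > 0:
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    n, m = 5, 4
    same = run(transformed_order(n, m), n, m) == run(original_order(n, m), n, m)
    print(same, run(original_order(n, m), n, m)[n - 1][m - 1] == comb(n + m - 2, n - 1))
```

Running it prints `True True`: the wavefront order visits the same iterations and produces the same array, and because both dependences advance t by exactly 1, all points with equal `t` could run in parallel.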
Automatic C-to-CUDA Code Generation for Affine Programs
Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded parallel programming model, facilitating high performance implementations of general-purpose computations. However, the explicitly managed memory hierarchy and multi-level parallel view make manual development of high-performance CUDA code rather complicate...
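A central step in this kind of C-to-CUDA translation is assigning loop iterations to a grid of thread blocks. The Python sketch below emulates that for one assumed example: a parallel 2D loop nest whose iterations are tiled into 16 x 8 thread blocks, with the `threadIdx.x` stand-in varying along the contiguous dimension and a bounds guard for partial tiles. It is a sequential emulation for illustration; the block shape and names are hypothetical, not the mapping produced by the cited tool.

```python
# Emulation of a CUDA-style launch for the loop nest
#   for i in 0..n-1: for j in 0..m-1: C[i][j] = A[i][j] + B[i][j]
# Each (block, thread) pair runs one iteration. The block shape and all
# names here are illustrative assumptions, not output of a real compiler.
BLOCK_X, BLOCK_Y = 16, 8


def ceil_div(a, b):
    return (a + b - 1) // b


def kernel_body(c, a, b, n, m, bx, by, tx, ty):
    """What one thread would do; bx/by/tx/ty stand in for blockIdx/threadIdx."""
    i = by * BLOCK_Y + ty           # rows from blockIdx.y, threadIdx.y
    j = bx * BLOCK_X + tx           # columns from blockIdx.x, threadIdx.x
    if i < n and j < m:             # guard for partial tiles at the edges
        c[i][j] = a[i][j] + b[i][j]


def launch(a, b, n, m):
    """Run every thread of the grid sequentially (a GPU would run them in parallel)."""
    c = [[0] * m for _ in range(n)]
    grid_x, grid_y = ceil_div(m, BLOCK_X), ceil_div(n, BLOCK_Y)
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(BLOCK_Y):
                for tx in range(BLOCK_X):
                    kernel_body(c, a, b, n, m, bx, by, tx, ty)
    return c
```

Putting `threadIdx.x` on the column index means consecutive threads of a warp touch consecutive elements of a row, which is the access pattern GPUs handle most efficiently.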
Interprocedural Transformations for Parallel Code Generation
We present a new approach that enables compiler optimization of procedure calls and loop nests containing procedure calls. We introduce two interprocedural transformations that move loops across procedure boundaries, exposing them to traditional optimizations on loop nests. These transformations are incorporated into a code generation algorithm for a shared-memory multiprocessor. The code gene...
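As a rough illustration of moving a loop across a procedure boundary, the Python sketch below hoists a callee's loop into its caller and then fuses it with a loop already in the caller. We assume this corresponds to what is often called loop extraction; the opposite direction, pushing a caller's loop into the callee, is usually called loop embedding. The procedures are hypothetical and the code is not taken from the paper.

```python
# Loop extraction (name and form assumed for illustration): the outer loop of
# a callee is moved into its caller, so the caller's loops become visible to
# ordinary loop optimizations such as fusion.


def scale_all(x, s):
    """Original callee: owns the loop over x."""
    for i in range(len(x)):
        x[i] *= s


def caller_original(x, s):
    scale_all(x, s)                 # the callee's loop is hidden behind the call
    for i in range(len(x)):
        x[i] += 1


def scale_one(x, i, s):
    """Callee after extraction: only the loop body remains."""
    x[i] *= s


def caller_extracted_fused(x, s):
    # Extracted loop fused with the caller's own loop: legal here because
    # iteration i of both loops touches only x[i].
    for i in range(len(x)):
        scale_one(x, i, s)
        x[i] += 1
```

With `s = 10`, both callers turn `[1, 2, 3]` into `[11, 21, 31]`; the fused version makes one pass over `x` instead of two.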
Journal
Journal title: ACM Transactions on Architecture and Code Optimization
Year: 2013
ISSN: 1544-3566,1544-3973
DOI: 10.1145/2400682.2400713